These aren't AI firms, they're defense contractors. We can't let them hide behind their models

The Guardian

From Gaza to Iran, the pattern is the same: precision weapons, chosen blindness, and dead children. There is an Israeli military strategy called the "fog procedure". First used during the second intifada, it's an unofficial rule that requires soldiers guarding military posts in conditions of low visibility to shoot bursts of gunfire into the darkness, on the theory that an invisible threat might be lurking. It's violence licensed by blindness. Shoot into the darkness and call it deterrence. With the dawn of AI warfare, that same logic of chosen blindness has been refined, systematized, and handed off to a machine.



Supplementary Material

Neural Information Processing Systems

Using a targeted attack strategy allows us to account for the randomness of target-label sampling. We choose the Jester dataset because it generally takes few queries to attack, thus saving testing time. GEO-TRAP can employ different kinds of geometric transformations in the TRANS-WARP function. This is demonstrated by the fact that GEO-TRAP's gradients generally have larger cosine similarity with the ground-truth gradients. We denote the probability score associated with this label as p_y(x).
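The cosine-similarity diagnostic mentioned above can be sketched as follows; this is an illustrative helper, not GEO-TRAP's actual evaluation code, and the function name is an assumption:

```python
import numpy as np

def gradient_cosine_similarity(est_grad, true_grad):
    """Cosine similarity between a black-box gradient estimate and the
    ground-truth gradient, flattened to vectors. Values near 1 indicate
    the estimate points in roughly the same direction as the true gradient."""
    est = np.asarray(est_grad, dtype=float).ravel()
    true = np.asarray(true_grad, dtype=float).ravel()
    denom = np.linalg.norm(est) * np.linalg.norm(true)
    if denom == 0.0:
        return 0.0  # degenerate case: one of the gradients is all zeros
    return float(np.dot(est, true) / denom)
```

A perfectly aligned estimate scores 1.0 regardless of its magnitude, which is why cosine similarity (rather than raw error) is the natural measure of gradient-direction quality.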


DINOv3 as a Frozen Encoder for CRPS-Oriented Probabilistic Rainfall Nowcasting

Filho, Luciano Araujo Dourado, Neto, Almir Moreira da Silva, Miyaguchi, Anthony, David, Rodrigo Pereira, Calumby, Rodrigo Tripodi, Picek, Lukáš

arXiv.org Artificial Intelligence

This paper proposes a competitive and computationally efficient approach to probabilistic rainfall nowcasting. A video projector (a V-JEPA Vision Transformer) with a lightweight probabilistic head is attached to a pre-trained satellite vision encoder (DINOv3-SAT493M) to map encoder tokens into a discrete empirical CDF (eCDF) over 4-hour accumulated rainfall. The projector and head are optimized end-to-end on the Ranked Probability Score (RPS). As alternatives, 3D-UNET baselines are trained with an aggregate Ranked Probability Score and a per-pixel Gamma-Hurdle objective. On the Weather4Cast 2025 benchmark, the proposed method achieved promising performance, with a CRPS of 3.5102, an effectiveness gain of approximately 26% over the best 3D-UNET.
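The CRPS of a discrete predictive CDF against a scalar observation can be sketched as below. The threshold grid and discretization are illustrative assumptions; the Weather4Cast pipeline's exact binning is not reproduced here:

```python
import numpy as np

def crps_from_ecdf(grid, cdf, obs):
    """Approximate CRPS of a discrete CDF against a scalar observation.

    grid : sorted rainfall thresholds (e.g. mm of accumulated rain)
    cdf  : predicted P(X <= grid[k]) at each threshold
    obs  : the observed accumulated rainfall

    CRPS integrates the squared difference between the predicted CDF and
    the observation's step function; here the integral is a Riemann sum
    with spacings taken from the grid itself."""
    step = (grid >= obs).astype(float)      # Heaviside step at the observation
    dx = np.gradient(grid)                   # local spacing between thresholds
    return float(np.sum((cdf - step) ** 2 * dx))
```

A forecast whose CDF coincides with the observation's step function scores 0; sharper, well-calibrated forecasts score lower, which is what the eCDF head is trained to minimize.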


Think, Remember, Navigate: Zero-Shot Object-Goal Navigation with VLM-Powered Reasoning

Habibpour, Mobin, Afghah, Fatemeh

arXiv.org Artificial Intelligence

While Vision-Language Models (VLMs) are set to transform robotic navigation, existing methods often underutilize their reasoning capabilities. To unlock the full potential of VLMs in robotics, we shift their role from passive observers to active strategists in the navigation process. Our framework outsources high-level planning to a VLM, which leverages its contextual understanding to guide a frontier-based exploration agent. This intelligent guidance is achieved through a trio of techniques: structured chain-of-thought prompting that elicits logical, step-by-step reasoning; dynamic inclusion of the agent's recent action history to prevent getting stuck in loops; and a novel capability that enables the VLM to interpret top-down obstacle maps alongside first-person views, thereby enhancing spatial awareness. When tested on challenging benchmarks like HM3D, Gibson, and MP3D, this method produces exceptionally direct and logical trajectories, marking a substantial improvement in navigation efficiency over existing approaches and charting a path toward more capable embodied agents.
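The trio of techniques described above (structured chain-of-thought prompting, recent action history, and a top-down map summary) can be sketched as a prompt-assembly step. All field names and wording are illustrative assumptions; the paper's actual prompt is not reproduced here:

```python
def build_navigation_prompt(goal, frontiers, recent_actions, map_summary):
    """Assemble a structured chain-of-thought prompt for a VLM planner.

    goal           : the target object category (e.g. "chair")
    frontiers      : textual descriptions of candidate exploration frontiers
    recent_actions : the agent's last few actions, used to discourage loops
    map_summary    : a textual summary of the top-down obstacle map
    """
    history = ", ".join(recent_actions[-5:]) or "none"
    frontier_list = "\n".join(f"  {i}: {desc}" for i, desc in enumerate(frontiers))
    return (
        f"Goal object: {goal}\n"
        f"Top-down obstacle map summary: {map_summary}\n"
        f"Recent actions (avoid repeating loops): {history}\n"
        f"Candidate frontiers:\n{frontier_list}\n"
        "Think step by step: (1) infer which room most likely contains the goal, "
        "(2) rule out frontiers that revisit already-explored space, "
        "(3) answer with the index of the best frontier."
    )
```

The returned string would be sent to the VLM together with the first-person image; the explicit numbered reasoning steps are what elicits the logical, step-by-step planning the abstract describes.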


What Do Temporal Graph Learning Models Learn?

Hayes, Abigail J., Schumacher, Tobias, Strohmaier, Markus

arXiv.org Artificial Intelligence

Learning on temporal graphs has become a central topic in graph representation learning, with numerous benchmarks indicating the strong performance of state-of-the-art models. However, recent work has raised concerns about the reliability of benchmark results, noting issues with commonly used evaluation protocols and the surprising competitiveness of simple heuristics. This contrast raises the question of which properties of the underlying graphs temporal graph learning models actually use to form their predictions. We address this by systematically evaluating seven models on their ability to capture eight fundamental attributes related to the link structure of temporal graphs. These include structural characteristics such as density, temporal patterns such as recency, and edge formation mechanisms such as homophily. Using both synthetic and real-world datasets, we analyze how well models learn these attributes. Our findings reveal a mixed picture: models capture some attributes well but fail to reproduce others, exposing important limitations. Overall, we believe that our results provide practical insights for the application of temporal graph learning models, and motivate more interpretability-driven evaluations in temporal graph learning research.
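The "surprising competitiveness of simple heuristics" mentioned above can be illustrated with a minimal recency baseline for temporal link prediction. This is a generic sketch of that style of heuristic, not necessarily one of the baselines the paper evaluates:

```python
class RecencyBaseline:
    """Predict a link as positive iff the same edge was observed recently.

    A memoryless-model-free heuristic: it stores, per (src, dst) pair, the
    last timestamp at which the edge appeared, and predicts 1.0 when that
    timestamp falls within a fixed recency window of the query time."""

    def __init__(self, window):
        self.window = window       # how far back an observation still counts
        self.last_seen = {}        # (src, dst) -> last observation time

    def update(self, src, dst, t):
        """Record an observed edge at time t (call in timestamp order)."""
        self.last_seen[(src, dst)] = t

    def predict(self, src, dst, t):
        """Score a candidate edge at query time t."""
        last = self.last_seen.get((src, dst))
        return 1.0 if last is not None and t - last <= self.window else 0.0
```

Because many real temporal graphs are strongly recency-driven, even this two-method class can rival learned models on standard link-prediction benchmarks, which is exactly why probing what trained models actually capture matters.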



What Scales in Cross-Entropy Scaling Law?

Yan, Junxi, Wei, Zixi, Zhan, Jingtao, Ai, Qingyao, Liu, Yiqun

arXiv.org Artificial Intelligence

The cross-entropy scaling law has long served as a key tool for guiding the development of large language models. It shows that cross-entropy loss decreases at a predictable power-law rate as model size increases. However, recent evidence indicates that this law breaks down at very large scales: the loss decreases more slowly than expected, which poses significant challenges for developing large language models. In this paper, we hypothesize that the root cause lies in the fact that cross-entropy itself does not truly scale; instead, only one of its hidden components does. To investigate this, we introduce a novel decomposition of cross-entropy into three parts: Error-Entropy, Self-Alignment, and Confidence. We show both theoretically and empirically that this decomposition precisely captures the training dynamics and optimization objectives. Through extensive experiments on multiple datasets and 32 models spanning five orders of magnitude in size, we find that only error-entropy follows a robust power-law scaling, while the other two terms remain largely invariant. Moreover, error-entropy constitutes the dominant share of cross-entropy in small models but diminishes in proportion as models grow larger. This explains why the cross-entropy scaling law appears accurate at small scales but fails at very large ones. Our findings establish the error-entropy scaling law as a more accurate description of model behavior. We believe it will have wide applications in the training, understanding, and future development of large language models.
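A power-law scaling relation of the form loss = a * N^(-b) + c (with an irreducible floor c) can be fit to (model size, loss) pairs as sketched below. This is a generic scaling-law fit under that assumed functional form, not the paper's own fitting procedure:

```python
import numpy as np

def fit_power_law(sizes, losses, floor_grid):
    """Fit loss = a * N**(-b) + c by grid search over the floor c.

    For each candidate floor, the residual (loss - c) is fit as a straight
    line in log-log space, giving a and b in closed form; the candidate with
    the smallest squared error wins. Returns (a, b, c)."""
    sizes = np.asarray(sizes, dtype=float)
    losses = np.asarray(losses, dtype=float)
    best = None
    for c in floor_grid:
        resid = losses - c
        if np.any(resid <= 0):
            continue  # floor too high: log of residual undefined
        slope, intercept = np.polyfit(np.log(sizes), np.log(resid), 1)
        pred = np.exp(intercept) * sizes ** slope + c
        sse = float(np.sum((losses - pred) ** 2))
        if best is None or sse < best[0]:
            best = (sse, np.exp(intercept), -slope, c)
    _, a, b, c = best
    return a, b, c
```

A diagnostic in this spirit makes the paper's claim testable: fitting this form to total cross-entropy versus fitting it to an individual component shows which quantity actually follows a clean power law.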